
feat: Cross-track upgrade documentation #174

Draft
MichaelThamm wants to merge 43 commits into main from feat/charmhub-module

@MichaelThamm
Contributor

@MichaelThamm MichaelThamm commented Jan 15, 2026

Blocked by:

Blocks:

Relates to:

Issue

We want a smooth UX for users upgrading across tracks for our product modules: COS and COS Lite. Ideally, a user should be able to take their track/2 state and plan an upgrade to track/3.0 without any manual intervention.

Solution

  1. Use the `juju_charm` data source to provide the latest revision of a charm in a specific track.
  2. Use `lifecycle { replace_triggered_by = [terraform_data.grafana_ingress_interface] }` to replace the `juju_integration` if the interface changes.
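Here is a minimal sketch of how the two pieces above could fit together for the Grafana ingress integration. The resource and application names match the plan output later in this thread. The following are assumptions for illustration, not the module's actual code: the `juju_charm` data source arguments (`charm`, `channel`), its `requires` attribute being a map from endpoint name to interface name, and the `model_uuid`/`channel` variables.

```hcl
variable "model_uuid" { type = string }
variable "channel" { type = string }

# Assumed shape of the juju_charm data source: look up the latest revision of
# grafana-k8s in the given channel. Argument names are hypothetical.
data "juju_charm" "grafana" {
  charm   = "grafana-k8s"
  channel = var.channel
}

# Records the interface that Grafana's "ingress" endpoint uses in the target
# channel ("traefik_route" on track 2 and "ingress" on dev/edge, per the plan
# output). Any change to this value replaces the terraform_data resource.
resource "terraform_data" "grafana_ingress_interface" {
  triggers_replace = data.juju_charm.grafana.requires["ingress"]
}

resource "juju_integration" "grafana_ingress" {
  model_uuid = var.model_uuid

  # Hypothetical mapping from Grafana's interface to the matching Traefik endpoint
  application {
    name     = "traefik"
    endpoint = data.juju_charm.grafana.requires["ingress"] == "traefik_route" ? "traefik-route" : "ingress"
  }

  application {
    name     = "grafana"
    endpoint = "ingress"
  }

  lifecycle {
    # Destroy and recreate the integration whenever the recorded interface
    # changes, rather than keeping the old relation around.
    replace_triggered_by = [terraform_data.grafana_ingress_interface]
  }
}
```

With a setup along these lines, switching `channel` across tracks would be expected to show the integration as "replaced due to changes in replace_triggered_by" in `terraform plan`.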

Checklist

Context

In the future, charms will have unique tracks, which the products need to map to:

Testing Instructions

See this comment for details:

The general idea is to:

  1. Deploy COS 2/stable
  2. Update TF module source to dev/edge
  3. `terraform init -upgrade; terraform apply`

Documentation

See the CI job for the documentation changes.

@@ -173,6 +173,8 @@ $ juju deploy cos-lite \
--overlay ./storage-small-overlay.yaml
```

(deploy-cos-ref)=
Contributor

IIRC, we haven't started using these in our docs yet.
Remind me: is it an anchor or a cross-Sphinx ref?
I think Juju had these and they turned out to be brittle and difficult to maintain, but maybe I'm missing something.

Contributor

It's an anchor, but it can also be used as a cross-sphinx/project ref for projects that have that enabled.

You haven't started using them in the COS docs yet, but they are the standard approach to cross-linking in Sphinx docs projects, so at some point you should change your links to use these refs. Juju uses them in its docs, but I don't know what conversations happened around them.

Using ref targets/anchors is nicer because otherwise, any time you change a file name or path, every link to that file in your docs breaks.

IMO, it could go either way in this PR (add it or don't), but the future goal should be for every doc to have a ref target, and for links to use those targets instead of file paths. (Copilot should handle this well when you take on that initiative.)

Comment on lines +33 to +38
```diff
+ http = {
+ source = "hashicorp/http"
+ version = "~> 3.0"
+ }
```
Contributor

Do we need to specify `http` here? Can it be a derived dependency of the charmhub module?
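For reference, Terraform resolves provider requirements across every module in a configuration. `terraform init` in the root module also installs providers that only a child module declares, so a root module that neither configures nor directly uses `http` should not need to repeat the block. The sketch below assumes the charmhub module keeps its requirements in a `versions.tf` file; that file name is hypothetical.

```hcl
# terraform/charmhub/versions.tf (hypothetical file name)
terraform {
  required_providers {
    # Declared by the child module itself; a root module that calls this
    # module picks up the requirement transitively during `terraform init`.
    http = {
      source  = "hashicorp/http"
      version = "~> 3.0"
    }
  }
}
```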

@MichaelThamm
Contributor Author

Hi @YanisaHS, could you please review this PR with me? I would appreciate some opinions on the clarity and structure of the document.

@YanisaHS
Contributor

@MichaelThamm Yea, I'd be happy to! 😊 I likely won't have time to review it until Tuesday though (I'm off Monday)

@MichaelThamm
Contributor Author

> @MichaelThamm Yea, I'd be happy to! 😊 I likely won't have time to review it until Tuesday though (I'm off Monday)

@YanisaHS This PR is still a work in progress, so it may be best to wait until I've made more progress before reviewing it. The docs still need updating before a first review.

@YanisaHS
Contributor

@MichaelThamm Ok, sounds good! Just tag me when you're ready again

@MichaelThamm
Contributor Author

MichaelThamm commented Jan 21, 2026

@YanisaHS

I think it is ready for review now: docs link

I'd like your input on the following:

  • This feature (updating the channel refreshes the charms to the latest revision) is not guaranteed to work across tracks. The tutorial goes from `2/stable` to `2/edge`, which is within the same track, so it should be guaranteed to work.
  • Would you convert this into a How-to instead of a tutorial?
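Within a single track, the refresh described above comes down to changing the module's `channel` input and running `terraform apply`. The sketch below assumes the `track/2` module inputs (`channel`, `model_uuid`, `internal_tls`) and the model resource from the test configuration later in this thread.

```hcl
resource "juju_model" "cos-lite" {
  name = "cos-lite"
}

module "cos-lite" {
  source = "git::https://github.com/canonical/observability-stack//terraform/cos-lite?ref=track/2"
  # Previously "2/stable". "2/edge" stays within track 2, so the refresh is
  # expected to work; moving to a different track is not guaranteed to.
  channel      = "2/edge"
  model_uuid   = juju_model.cos-lite.uuid
  internal_tls = false
}
```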

I still need to add a note to the doc saying that this is not guaranteed to work across tracks.

@YanisaHS
Contributor

@MichaelThamm Ok! I'll review it soon (was off the end of last week)

@YanisaHS
Contributor

@MichaelThamm I'll go through and provide feedback in the doc itself, but first, to address your two questions:

> This feature (updating channel updates charms to latest revision) is not guaranteed to work across tracks. In the tutorial, it shows 2/stable to 2/edge which is within the same track so it should be guaranteed to work.

For a tutorial, this is fine. I'd still recommend adding a note (as you suggested) describing this condition, mostly for drive-by users: people who find this page without realizing they're in a Diataxis ✨ tutorial ✨. Tutorials are more of a "sandboxed" experience, so it's okay to feed the user the configuration they need.

But...

> Would you convert this into a How-to instead of a tutorial?

Yes. The Tutorials section of the Observability docs presents pages in the order a new user should follow them, so it currently reads as (1) Deploy, (2) Refresh, (3) Sync, etc. That feels odd for a new user. If it's important that they're on this track, the preceding Deploy steps should put them there, rather than having them deploy and then immediately refresh.

This is actually something we'll discuss as a team this cycle. From a user's perspective, one of my most immediate concerns with the docs is that the How-to guides don't clearly state how to deploy COS (for comparison, see Charmed Kafka, Charmed Postgres, and Charmed K8s). Deploying is the first thing a user needs to do to access the product (and the rest of the documentation).

I'll provide feedback with this in mind as I go through your PR - assuming you'll refactor it into a how-to guide.

Contributor

@YanisaHS YanisaHS left a comment

I mostly looked at the Refresh product module doc, although I quickly scanned the README and had no comments there. If you want feedback on anything else in the PR, LMK.

@@ -0,0 +1,111 @@
# Refresh COS to a new channel

In this example, you will learn how to deploy COS Lite and refresh from channel `2/stable` to `2/edge`. To do this, we can deploy COS Lite via Terraform in the same way as [in the tutorial](https://documentation.ubuntu.com/observability/track-2/tutorial/installation/cos-lite-microk8s-sandbox).
Contributor

Assuming this gets refactored into a how-to guide, the language and framing should change so it doesn't read like "In this example, you'll learn...". How-to guides aren't really learning experiences; they're more like "here are the steps you need to do XYZ".

Example: Charmed Kafka: How to upgrade


- Know how to deploy {ref}`COS Lite with Terraform <deploy-cos-ref>`

## Introduction
Contributor

As with my comment above, this section can be less narrative if it's refactored into a how-to guide.

(btw I'm happy to meet and discuss my comments with you if you want!)

terraform apply
```

At this point, you will have successfully upgraded COS Lite from `2/stable` to `2/edge`!
Contributor

Overall, the steps seem fine and pretty straightforward. I can give better feedback once it's refactored into a how-to guide.

MichaelThamm and others added 9 commits March 22, 2026 19:29
* Separates the storage directives for different worker roles

Signed-off-by: Bartlomiej Gmerek <bartlomiej.gmerek@canonical.com>

* chore: TODOs for tests

* Separates storage directives for Tempo workers

Signed-off-by: Bartlomiej Gmerek <bartlomiej.gmerek@canonical.com>

* Separates storage directives for Tempo workers

Signed-off-by: Bartlomiej Gmerek <bartlomiej.gmerek@canonical.com>

* Cleans up after testing

Signed-off-by: Bartlomiej Gmerek <bartlomiej.gmerek@canonical.com>

---------

Signed-off-by: Bartlomiej Gmerek <bartlomiej.gmerek@canonical.com>
Co-authored-by: Michael Thamm <mike.thamm@canonical.com>
This PR fixes #188

Signed-off-by: Jose C. Massón <939888+Abuelodelanada@users.noreply.github.com>
@MichaelThamm MichaelThamm changed the title from "feat: Charmhub module for upgrades without revision pins" to "feat: Cross-track upgrades" Mar 23, 2026
@MichaelThamm
Contributor Author

MichaelThamm commented Apr 22, 2026

When trying to upgrade, I was getting this output:

```console
module.cos-lite.module.grafana.juju_application.grafana: Modifications complete after 2s  ← provider returned, Juju still refreshing
module.cos-lite.juju_integration.grafana_ingress[0]: Creating...                          ← immediately tries new endpoint
│ Error: no relations found                                                                ← old charm still active
```

Summarizing my call with @adhityaravi:

  1. We can try backporting the `terraform_data` resources for `juju_charm.resources` and/or `juju_charm.requires["ingress"]` into an ephemeral branch off `track/2`, for both the COS Lite and Grafana TF modules. We would then reference those in the itest and retry, to confirm that the lifecycle action actually takes place.
  2. `resource "juju_integration" "grafana_ingress"` fails with `Unable to create integration, got error: no relations found`.
    • Is this a timing issue with Terraform? Grafana does become `dev/edge`, but it is in an Error state. I was able to manually relate `grafana:ingress` to `traefik:ingress`, which suggests it could be a timing issue.
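Item 1 could look roughly like the following for the Grafana application. This is a sketch under stated assumptions:

- The `juju_charm` data source arguments are hypothetical.
- Its `resources` attribute is assumed to be a map keyed by resource name.
- The `litestream-image` resource name is hypothetical.

The resource names `grafana_litestream_resource` and `grafana` match the `module.grafana` entries in the plan output later in this thread.

```hcl
variable "model_uuid" { type = string }
variable "channel" { type = string }

# Assumed data source shape; argument and attribute names are hypothetical.
data "juju_charm" "grafana" {
  charm   = "grafana-k8s"
  channel = var.channel
}

# true while the charm in var.channel still ships the (hypothetical)
# "litestream-image" resource; when it flips to false, this is replaced.
resource "terraform_data" "grafana_litestream_resource" {
  triggers_replace = contains(keys(data.juju_charm.grafana.resources), "litestream-image")
}

resource "juju_application" "grafana" {
  name       = "grafana"
  model_uuid = var.model_uuid

  charm {
    name    = "grafana-k8s"
    channel = var.channel
  }

  lifecycle {
    # Redeploy Grafana instead of refreshing it in place when the charm's
    # resource set changes between tracks.
    replace_triggered_by = [terraform_data.grafana_litestream_resource]
  }
}
```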

@MichaelThamm
Contributor Author

MichaelThamm commented Apr 24, 2026

Testing COS Lite from track/2 -> dev/edge

```hcl
terraform {
  required_version = ">= 1.5"
  required_providers {
    juju = {
      source  = "juju/juju"
      version = "~> 1.0"
    }
  }
}

resource "juju_model" "cos-lite" {
  name = "cos-lite"
}

module "cos-lite" {
  source       = "git::https://github.com/canonical/observability-stack//terraform/cos-lite?ref=track/2"
  channel      = "2/stable"
  model_uuid   = juju_model.cos-lite.uuid
  internal_tls = false
}
```

```console
❯ tf init
❯ tf apply
Apply complete! Resources: 32 added, 0 changed, 0 destroyed.
```

Then update the module source and apply:

```hcl
module "cos-lite" {
  source       = "git::https://github.com/canonical/observability-stack//terraform/cos-lite?ref=test/tf-lifecycle"
  channel      = "2/stable"
  model_uuid   = juju_model.cos-lite.uuid
  internal_tls = false
}
```

```console
❯ tf init -upgrade
❯ tf apply
Terraform will perform the following actions:

  # module.cos-lite.terraform_data.grafana_ingress_interface will be created
  + resource "terraform_data" "grafana_ingress_interface" {
      + id               = (known after apply)
      + triggers_replace = "traefik_route"
    }

  # module.cos-lite.terraform_data.grafana_litestream_resource will be created
  + resource "terraform_data" "grafana_litestream_resource" {
      + id               = (known after apply)
      + triggers_replace = true
    }

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.
```

Then update the module source and `risk`, and apply:

```hcl
module "cos-lite" {
  source       = "../../terraform/cos-lite"
  risk         = "edge"
  model_uuid   = juju_model.cos-lite.uuid
  internal_tls = false
}
```

```console
❯ tf init -upgrade
❯ tf apply
Terraform will perform the following actions:

  # module.cos-lite.juju_integration.grafana_ingress[0] will be replaced due to changes in replace_triggered_by
  # (moved from module.cos-lite.juju_integration.grafana_ingress)
-/+ resource "juju_integration" "grafana_ingress" {
      ~ id         = "335d5e05-3b09-493c-8c30-d73fa780f21b:traefik:traefik-route:grafana:ingress" -> (known after apply)
        # (1 unchanged attribute hidden)

      - application { # forces replacement
          - endpoint = "traefik-route" -> null
          - name     = "traefik" -> null
        }
      + application { # forces replacement
          + endpoint = "ingress"
          + name     = "traefik"
        }

        # (1 unchanged block hidden)
    }

  # module.cos-lite.terraform_data.grafana_ingress_interface must be replaced
-/+ resource "terraform_data" "grafana_ingress_interface" {
      ~ id               = "ea3af7ce-9e26-5975-e091-a756d7835a08" -> (known after apply)
      ~ triggers_replace = "traefik_route" -> "ingress"
    }

  # module.cos-lite.module.grafana.juju_application.grafana will be replaced due to changes in replace_triggered_by
-/+ resource "juju_application" "grafana" {
      ~ id                 = "335d5e05-3b09-493c-8c30-d73fa780f21b:grafana" -> (known after apply)
      ~ machines           = [] -> (known after apply)
      ~ model_type         = "caas" -> (known after apply)
        name               = "grafana"
      ~ storage            = [
          - {
              - count = 1 -> null
              - label = "database" -> null
              - pool  = "kubernetes" -> null
              - size  = "1G" -> null
            },
        ] -> (known after apply)
        # (6 unchanged attributes hidden)

      ~ charm {
          ~ base     = "ubuntu@24.04" -> (known after apply)
          ~ channel  = "2/stable" -> "dev/edge"
            name     = "grafana-k8s"
          ~ revision = 180 -> 186
        }
    }

  # module.cos-lite.module.grafana.terraform_data.grafana_litestream_resource must be replaced
-/+ resource "terraform_data" "grafana_litestream_resource" {
      ~ id               = "b26c2ad5-4ee2-4a37-cf7d-85fd10ce77b4" -> (known after apply)
      ~ triggers_replace = true -> false
    }

Plan: 9 to add, 6 to change, 8 to destroy.

Apply complete! Resources: 8 added, 5 changed, 8 destroyed.
```

> [!WARNING]
> Although COS is now active/idle, some resources are missing (specifically, all the Grafana resources)!

```console
terraform state rm 'module.cos-lite.juju_offer.grafana_dashboards'
Removed module.cos-lite.juju_offer.grafana_dashboards
Successfully removed 1 resource instance(s).

terraform apply
Apply complete! Resources: 9 added, 0 changed, 0 destroyed.
tf apply
```

@MichaelThamm MichaelThamm changed the title from "feat: Cross-track upgrades" to "feat: Cross-track upgrade documentation" Apr 24, 2026